Introduction
During the last 30 years, modern epidemiology has identified important drawbacks of classical epidemiologic methods. Causal inference (Robins et al., 2000) and the Neyman-Rubin potential outcomes framework (Rubin, 2011) have provided the theory and statistical methods needed to identify recurrent problems in observational epidemiologic research, such as:
- non-collapsibility of the odds and hazard ratios,
- impact of paradoxical effects due to conditioning on colliders,
- left truncation,
- prevalent cases,
- selection bias related to a vague understanding of the effect of time on exposure and outcome, and
- the effect of time-dependent confounding and mediators.
To control for confounding, classical epidemiologic methods require assuming that the effect measure is constant across levels of the confounders included in the model.
Alternatively, in 1986 James Robins demonstrated that standardization, implemented through the G-formula, yields an unconfounded marginal estimate of the causal average treatment effect (ATE) under nontestable causal assumptions (Greenland and Robins, 1986). The most commonly used estimator for a binary treatment effect is the risk difference, ATE = \(\psi(P_{0})\).
The G-Formula
\[\psi(P_{0})\,=\,\sum_{w}\,\left[\sum_{y}\,y\,P(Y=y\mid A=1,W=w)\,-\,\sum_{y}\,y\,P(Y=y\mid A=0,W=w)\right]P(W=w)\]
where,
\[P(Y = y \mid A = a, W = w)\,=\,\frac{P(W = w, A = a, Y = y)}{\sum_{y}\,P(W = w, A = a, Y = y)}\]
is the conditional probability distribution of Y = y given A = a and W = w, and
\[P(W = w)\,=\,\sum_{y,a}\,P(W = w, A = a, Y = y)\]
is the marginal probability distribution of W = w.
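As a minimal illustration, the nonparametric plug-in can be computed directly from empirical frequencies. The sketch below uses a hypothetical data-generating process with a single binary confounder W; all values are assumptions for illustration only:

```r
# Illustrative sketch: nonparametric G-formula (standardization)
# with one binary confounder W (hypothetical data-generating process)
set.seed(1)
n <- 1e5
W <- rbinom(n, 1, 0.4)
A <- rbinom(n, 1, plogis(-0.5 + W))
Y <- rbinom(n, 1, plogis(-1 + A + 0.5 * W))
# Standardize the stratum-specific risk differences over P(W = w)
psi <- sum(sapply(0:1, function(w) {
  (mean(Y[A == 1 & W == w]) - mean(Y[A == 0 & W == w])) * mean(W == w)
}))
psi # marginal (standardized) risk difference
```

With many or continuous confounders the strata become sparse, which is exactly the curse of dimensionality discussed below.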
Classical epidemiologic methods require assuming that the effect measure is constant across levels of the confounders included in the model. However, standardization yields an unconfounded summary effect measure without this assumption. The G-formula is a generalization of standardization (Greenland and Robins, 1986).
The ATE can be estimated non-parametrically using the G-formula. However, the curse of dimensionality in observational studies limits its nonparametric estimation.
Hence, estimation of the ATE using the G-formula relies mostly on parametric modelling assumptions and maximum likelihood estimation. Correct model specification is crucial to obtain unbiased estimates of the true ATE (Rubin, 2011).
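A minimal sketch of parametric G-computation via maximum likelihood, with hypothetical data and a single continuous confounder W (the model and coefficients are assumptions for illustration):

```r
# Illustrative sketch: parametric G-computation with maximum likelihood
# (hypothetical data; one continuous confounder W)
set.seed(1)
n <- 5000
W <- rnorm(n)
A <- rbinom(n, 1, plogis(0.5 * W))
Y <- rbinom(n, 1, plogis(-1 + A + W))
fit <- glm(Y ~ A + W, family = binomial)
# Predict each subject's risk under A = 1 and A = 0, then average the difference
r1 <- predict(fit, newdata = data.frame(A = 1, W = W), type = "response")
r0 <- predict(fit, newdata = data.frame(A = 0, W = W), type = "response")
mean(r1 - r0) # parametric G-formula estimate of the ATE
```

If the outcome model is misspecified, this estimate is biased, which motivates the double-robust approach described next.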
However, Mark van der Laan and collaborators have developed a double-robust estimation procedure that reduces bias from model misspecification. Targeted maximum likelihood estimation (TMLE) is a semiparametric, efficient substitution estimator (Laan and Rose, 2011).
TMLE allows for data-adaptive estimation while obtaining valid statistical inference, combining targeted minimum loss-based estimation with machine learning algorithms to minimize the risk of model misspecification (Laan and Rose, 2011).
TMLE is a general algorithm for the construction of double-robust, semiparametric, efficient substitution estimators.
TMLE implementation uses the G-computation estimand (G-formula). Briefly, the TMLE algorithm uses information in the estimated exposure mechanism P(A|W) to update the initial estimator of the conditional expectation of the outcome given the treatment and the set of covariates W, E\(_{0}\)(Y|A,W).
The targeted estimates are then substituted into the parameter mapping \(\Psi\). The updating step achieves a targeted bias reduction for the parameter of interest \(\psi(P_{0})\) (the true target parameter) and serves to solve the efficient score equation. As a result, TMLE is a double robust estimator.
TMLE is consistent for \(\psi(P_{0})\) if either the conditional expectation E\(_{0}\)(Y|A,W) or the exposure mechanism P\(_{0}\)(A|W) is estimated consistently. When both are estimated consistently, TMLE is efficient: it achieves the lowest asymptotic variance among a large class of estimators. These asymptotic properties typically translate into lower bias and variance in finite samples (Bühlmann et al., 2016).
The general formula to estimate the ATE using the TMLE method:
\[\psi_{TMLE,n} = \Psi(Q_{n}^{*}) = \frac{1}{n}\sum_{i=1}^{n}\left[\bar{Q}_{n}^{1}\left(1, W_{i}\right)-\bar{Q}_{n}^{1}\left(0, W_{i}\right)\right]. \quad (1)\] The efficient influence curve (IC), based on Hampel's seminal paper (Hampel, 1974), is used for statistical inference with TMLE:
\[IC_{n}(O_{i}) = \left(\frac{I\left(A_{i}=1\right)}{g_{n}\left(1\mid W_{i}\right)} - \frac{I\left(A_{i}=0\right)}{g_{n}\left(0\mid W_{i}\right)}\right)\left[Y_{i}-\bar{Q}_{n}^{1}\left(A_{i},W_{i}\right)\right]+\bar{Q}_{n}^{1}\left(1, W_{i}\right)-\bar{Q}_{n}^{1}\left(0, W_{i}\right) - \psi_{TMLE,n}. \quad (2)\] where the standard error of the ATE is estimated as:
\[\sigma({\psi_{0}})=\sqrt{\frac{Var(IC_{n})}{n}}. (3)\]
The procedure is available with standard software such as the tmle package in R (Gruber and Laan, 2011).
The advantages of TMLE have been repeatedly demonstrated in both simulation studies and applied analyses (Laan and Rose, 2011). Evidence shows that TMLE provides the least biased ATE estimate compared with other double-robust estimators (Neugebauer and Laan, 2005; Laan and Rose, 2011), such as regression adjustment combined with inverse probability of treatment weighting (IPTW-RA) and augmented inverse probability of treatment weighting (AIPTW). AIPTW estimation is a two-step procedure with two equations (a propensity score equation and a mean outcome equation).
To estimate the ATE using the AIPTW estimator, one can set the estimating equation (EE) (4) equal to zero and use the bootstrap to derive 95% confidence intervals (CI). Solving the EE with the generalized method of moments (GMM), stacking both equations (propensity score and outcome), reduces estimation and inference to a single step. However, because the inverse probability weights in equation (4) can become very large when, for some observations, \(g_{n}(1|W_{i})\) is close to 1 or 0 (a near violation of the positivity assumption), the AIPTW estimate can be unstable and may even fall outside the range [0, 1]. This is the price of not being a substitution estimator like TMLE.
\[\psi_{n}^{AIPTW-ATE} = \frac{1}{n}\sum_{i=1}^{n}\left(\frac{I\left(A_{i}=1\right)}{g_{n}\left(1\mid W_{i}\right)} - \frac{I\left(A_{i}=0\right)}{g_{n}\left(0\mid W_{i}\right)}\right)\left[Y_{i}-\bar{Q}_{n}^{0}\left(A_{i},W_{i}\right)\right]+\frac{1}{n}\sum_{i=1}^{n}\left[\bar{Q}_{n}^{0}\left(1, W_{i}\right)-\bar{Q}_{n}^{0}\left(0, W_{i}\right)\right]. \quad (4)\]
Structural causal framework
Directed Acyclic Graph
Figure 1. Directed Acyclic Graph
Causal assumptions
Under the counterfactual framework, the following assumptions must hold to estimate \(\psi(P_{0})\) (the ATE) with a model for \(P_{0}\) augmented with additional nontestable causal assumptions (Rubin, 2011), (Laan and Rose, 2011):
CMI or Randomization
\(Y_{0},Y_{1}\perp A\mid W\): the counterfactual outcomes of the binary treatment (A) on the outcome (Y) are conditionally independent of treatment assignment given the set of observed covariates (W), where W = (W1, W2, W3, … , Wk).
Positivity
For every a ∈ A: P(A = a | W = w) > 0; that is, P(A=1 | W=w) > 0 and P(A=0 | W=w) > 0 for each possible w.
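In practice, positivity can be assessed empirically by inspecting the distribution of the estimated propensity score. A minimal sketch with hypothetical data (the 0.025/0.975 cut-offs are an illustrative convention, not a fixed rule):

```r
# Illustrative sketch: checking positivity via the estimated propensity score
# (hypothetical data; in practice use the study's A and W)
set.seed(1)
W <- runif(1000)
A <- rbinom(1000, 1, plogis(-1 + 2 * W))
ps <- predict(glm(A ~ W, family = binomial), type = "response")
summary(ps)                   # values near 0 or 1 flag near violations
mean(ps < 0.025 | ps > 0.975) # proportion of extreme propensity scores
```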
Consistency or SUTVA:
The observed outcome, under the observed treatment, equals the counterfactual outcome corresponding to that treatment; the observations are assumed independent and identically distributed (i.i.d.).
TMLE flow chart
Source: Mark van der Laan and Sherri Rose. Targeted learning: causal inference for observational and experimental data. Springer Series in Statistics, 2011.
Figure 2. TMLE flow chart (Road map)
Data generation
Simulation
In R we create a function to generate the data. Its input is the number of draws and its output is the generated observed data (ObsData), including the counterfactual outcomes (Y.1, Y.0).
The simulated data replicate the DAG in Figure 1:
- Y: mortality binary indicator (1 death, 0 alive)
- A: binary treatment for emergency presentation at cancer diagnosis (1 EP, 0 NonEP)
- W1: Gender (1 male; 0 female)
- W2: Age at diagnosis (0 <65; 1 >=65)
- W3: Cancer TNM classification (continuous scale from 0 to 4)
- W4: Comorbidities (continuous scale from 0 to 5)
#install.packages("broom")
options(digits=4)
generateData <- function(n){
  w1 <- rbinom(n, size=1, prob=0.5)
  w2 <- rbinom(n, size=1, prob=0.65)
  w3 <- round(runif(n, min=0, max=4), digits=3)
  w4 <- round(runif(n, min=0, max=5), digits=3)
  A  <- rbinom(n, size=1, prob= plogis(-0.4 + 0.2*w2 + 0.15*w3 + 0.2*w4 + 0.15*w2*w4))
  Y  <- rbinom(n, size=1, prob= plogis(-1 + A - 0.1*w1 + 0.3*w2 + 0.25*w3 + 0.2*w4 + 0.15*w2*w4))
  # counterfactual outcomes under A = 1 and A = 0
  Y.1 <- rbinom(n, size=1, prob= plogis(-1 + 1 - 0.1*w1 + 0.3*w2 + 0.25*w3 + 0.2*w4 + 0.15*w2*w4))
  Y.0 <- rbinom(n, size=1, prob= plogis(-1 + 0 - 0.1*w1 + 0.3*w2 + 0.25*w3 + 0.2*w4 + 0.15*w2*w4))
  # return data.frame
  data.frame(w1, w2, w3, w4, A, Y, Y.1, Y.0)
}
set.seed(7777)
ObsData <- generateData(n=10000)
True_Psi <- mean(ObsData$Y.1-ObsData$Y.0);
cat(" True_Psi:", True_Psi)
True_Psi: 0.198
Bias_Psi <- lm(data=ObsData, Y~ A)
cat("\n")
cat("\n Naive_Biased_Psi:",summary(Bias_Psi)$coef[2, 1])
Naive_Biased_Psi: 0.2631
Naive_Bias <- ((summary(Bias_Psi)$coef[2, 1])-True_Psi); cat("\n Naive bias:", Naive_Bias)
Naive bias: 0.06509
Naive_Relative_Bias <- (((summary(Bias_Psi)$coef[2, 1])-True_Psi)/True_Psi)*100; cat("\n Relative naive bias:", Naive_Relative_Bias,"%")
Relative naive bias: 32.88 %
Data visualization
# DT table = interactive
# install.packages("DT") # install DT first
library(DT)
datatable(ObsData, options = list(pageLength = 5))
TMLE simple implementation
Step 1: \(Q_{0}\)(A,W)
Estimation of the initial probability of the outcome (Y) given the treatment (A) and the set of covariates (W), denoted \(Q_{0}\)(A,W). To estimate \(Q_{0}\)(A,W) we can use a standard logistic regression model:
\[\text{logit}[P(Y=1|A,W)]\,=\,\beta_{0}\,+\,\beta_{1}A\,+\,\beta_{2}^{T}W.\]
Therefore, we can estimate the initial probability as follows:
\[\bar{Q}^{0}(A,W)\,=\,\text{expit}(\hat{\beta_{0}}\,+\,\hat{\beta_{1}}A\,+\,\hat{\beta_{2}^{T}}W).\]
The predicted probabilities can be estimated using the Super Learner algorithm, implemented in the R package "SuperLearner" (Ref), which can include any terms that are functions of A or W (e.g., polynomial terms of A and W, as well as A-by-W interaction terms). Consequently, for each subject the predicted probabilities for both potential outcomes \(\bar{Q}^{0}(0,W)\) and \(\bar{Q}^{0}(1,W)\) can be estimated by setting A = 0 and A = 1 for everyone, respectively: \[\bar{Q}^{0}(0,W)\,=\,\text{expit}(\hat{\beta_{0}}\,+\,\hat{\beta_{2}^{T}}W),\] and,
\[\bar{Q}^{0}(1,W)\,=\,\text{expit}(\hat{\beta_{0}}\,+\,\hat{\beta_{1}}\,+\,\hat{\beta_{2}^{T}}W).\]
ObsData <-subset(ObsData, select=c(w1,w2,w3,w4,A,Y))
Y <- ObsData$Y
A <- ObsData$A
w1 <- ObsData$w1
w2 <- ObsData$w2
w3 <- ObsData$w3
w4 <- ObsData$w4
m <- glm(Y ~ A + w1 + w2 + w3 + w4, family=binomial, data=ObsData)
Q <- cbind(QAW = predict(m), # note: predict() defaults to the logit (link) scale
           Q1W = predict(m, newdata=data.frame(A = 1, w1, w2, w3, w4)),
           Q0W = predict(m, newdata=data.frame(A = 0, w1, w2, w3, w4)))
Q0 <- as.data.frame(Q); mean(Q0$Q1W - Q0$Q0W) # difference on the logit scale, not the ATE
[1] 1.026
Step 2: \(g_{0}(A,W)\)
Estimation of the probability of treatment (A) given the set of covariates (W), denoted \(g_{0}(A,W)\). We can again use a logistic regression model and, to improve the prediction, the Super Learner library or any other machine learning strategy:
\[\text{logit}[P(A=1|W)]\,=\,\beta_{0}\,+\,\beta_{1}^{T}W.\] Then, we estimate the predicted probability P(A = 1 | W) = \(\hat{g}(1,W)\) using:
\[\hat{g}(1,W)\,=\,\text{expit}(\hat{\beta_{0}}\,+\,\hat{\beta_{1}^{T}}W).\]
g <- glm(A ~ w2 + w3 + w4, family = binomial)
g1w = predict(g, type ="response");summary(g1w)
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.358 0.594 0.681 0.671 0.759 0.875
Step 3: HAW and \(\epsilon\)
This step aims to find a better prediction model, targeted at minimising the mean squared error (MSE) for the parameter of interest, by using the so-called efficient IC estimating equation. For the ATE, one-step convergence is guaranteed. Given \(\bar{Q}^{0}\) and \(\hat{g}(1,W)\), the fluctuation is modelled using a parametric working model to estimate the fluctuation parameters \(\epsilon_{0}\) and \(\epsilon_{1}\) as follows:
\[\bar{Q}^{1}(A,W)\,=\,\text{expit}\left[\text{logit}\left(\bar{Q}^{0}(A,W)\right)\,+\,\hat{\epsilon_{0}}H_{0}(A,W)\,+\,\hat{\epsilon_{1}}H_{1}(A,W)\right] \quad (5)\] \[\bar{Q}^{1}(0,W)\,=\,\text{expit}\left[\text{logit}\left(\bar{Q}^{0}(0,W)\right)\,+\,\hat{\epsilon_{0}}H_{0}(0,W)\right]\]
\[\bar{Q}^{1}(1,W)\,=\,\text{expit}\left[\text{logit}\left(\bar{Q}^{0}(1,W)\right)\,+\,\hat{\epsilon_{1}}H_{1}(1,W)\right]\] where \[H_{0}(A,W)\,=\,\frac{I(A=0)}{\hat{g}(0\mid W)}\quad\text{and}\quad H_{1}(A,W)\,=\,\frac{I(A=1)}{\hat{g}(1\mid W)}\] are referred to as the clever covariates (note that \(\hat{g}(A\mid W)\) is estimated in Step 2).
The fluctuation parameters \((\hat{\epsilon}_{0}, \hat{\epsilon}_{1})\) are estimated by maximum likelihood, setting \(\text{logit}(\bar{Q}^{0}(A,W))\) as an offset in an intercept-free logistic regression with \(H_{0}\) and \(H_{1}\) as covariates. The estimated probabilities of the potential outcomes are then updated using \((\hat{\epsilon}_{0}, \hat{\epsilon}_{1})\). The substitution update is performed by setting A = 0 and A = 1 for each subject in the initial estimates of the potential outcomes, giving \(\bar{Q}^{1}(0,W)\) and \(\bar{Q}^{1}(1,W)\), as well as in the clever covariates \(H_{0}(0,W)\) and \(H_{1}(1,W)\).
For the ATE, the updated estimate of the potential outcomes needs only one iteration, \(\bar{Q}^{0}(A,W)\rightarrow\bar{Q}^{1}(A,W)\), to obtain \(\Psi(\bar{Q}_{n}^{*})\). Therefore, model (5) targets \(E[\hat{Y}_{A=0}]\) and \(E[\hat{Y}_{A=1}]\) simultaneously by including both \(H_{0}(A,W)\) and \(H_{1}(A,W)\) in the model.
#Model (5): clever covariates and fluctuation/substitution parameters
h <- cbind(A/g1w -(1-A)/(1-g1w), 1/g1w, -1/(1-g1w))
epsilon <- coef(glm(Y ~ -1 + h[,1] + offset(Q[,"QAW"]), family = binomial));epsilon
h[, 1]
0.001189
Step 4: \(\bar{Q_{n}}^{*}\)
\[\psi_{TMLE,n} = \Psi(Q_{n}^{*}) = \frac{1}{n}\sum_{i=1}^{n}\left[\bar{Q}_{n}^{1}\left(1, W_{i}\right)-\bar{Q}_{n}^{1}\left(0, W_{i}\right)\right]. \quad (1)\]
Qstar <- plogis(Q + epsilon*h)
Psi <- mean(Qstar[,"Q1W"] - Qstar[,"Q0W"]);cat("TMLE_Psi:", Psi)
TMLE_Psi: 0.2004
cat("\n TMLE.SI_bias:", abs(True_Psi-Psi))
TMLE.SI_bias: 0.002383
cat("\n Relative_TMLE.SI_bias:",abs(True_Psi-Psi)/True_Psi*100,"%")
Relative_TMLE.SI_bias: 1.204 %
Step 5: Inference
\[IC_{n}(O_{i}) = \left(\frac{I\left(A_{i}=1\right)}{g_{n}\left(1\mid W_{i}\right)} - \frac{I\left(A_{i}=0\right)}{g_{n}\left(0\mid W_{i}\right)}\right)\left[Y_{i}-\bar{Q}_{n}^{1}\left(A_{i},W_{i}\right)\right]+\bar{Q}_{n}^{1}\left(1, W_{i}\right)-\bar{Q}_{n}^{1}\left(0, W_{i}\right) - \psi_{TMLE,n}. \quad (2)\]
where the standard error of the ATE is estimated as:
\[\sigma({\psi_{0}})=\sqrt{\frac{Var(IC_{n})}{n}}. (3)\]
Q <- as.data.frame(Q)
# per eq. (2), the IC uses the updated estimates on the probability scale
IC <- h[,1]*(Y - Qstar[,"QAW"]) + Qstar[,"Q1W"] - Qstar[,"Q0W"] - Psi; summary(IC)
Min. 1st Qu. Median Mean 3rd Qu. Max.
-2.950 -0.855 -0.200 -0.006 0.570 14.300
n <- nrow(ObsData)
varHat.IC <- var(IC)/n; varHat.IC
[1] 0.0002024
#Psi and 95%CI for Psi
cat("\n TMLE.SI_bias:", abs(True_Psi-Psi))
TMLE.SI_bias: 0.002383
cat("\n 95%CI:", c(Psi-1.96*sqrt(varHat.IC), Psi+1.96*sqrt(varHat.IC)))
95%CI: 0.1725 0.2283
cat("\n Relative_TMLE.SI_bias:",abs(True_Psi-Psi)/True_Psi*100,"%")
Relative_TMLE.SI_bias: 1.204 %
Comparison with AIPTW
\[\psi_{n}^{AIPTW-ATE} = \frac{1}{n}\sum_{i=1}^{n}\left(\frac{I\left(A_{i}=1\right)}{g_{n}\left(1\mid W_{i}\right)} - \frac{I\left(A_{i}=0\right)}{g_{n}\left(0\mid W_{i}\right)}\right)\left[Y_{i}-\bar{Q}_{n}^{0}\left(A_{i},W_{i}\right)\right]+\frac{1}{n}\sum_{i=1}^{n}\left[\bar{Q}_{n}^{0}\left(1, W_{i}\right)-\bar{Q}_{n}^{0}\left(0, W_{i}\right)\right]. \quad (4)\]
# per eq. (4), the initial estimates enter on the probability scale
AIPTW <- mean(h[,1]*(Y - plogis(Q$QAW)) + plogis(Q$Q1W) - plogis(Q$Q0W)); AIPTW
[1] 0.1948
cat("\n AIPTW_bias:", abs(True_Psi-AIPTW))
AIPTW_bias: 0.003211
cat("\n Relative_AIPTW_bias:",abs(True_Psi-AIPTW)/True_Psi*100,"%")
Relative_AIPTW_bias: 1.622 %
TMLE using the Super-Learner
#Q0
library(SuperLearner)
Loading required package: nnls
Super Learner
Version: 2.0-19
Package created on 2016-02-02
#Specify SuperLearner libraries
SL.library <- c("SL.glm","SL.step","SL.glm.interaction")
#Data frame with X with baseline covariates and exposure
X <- subset(ObsData, select=c(A, w1, w2, w3, w4))
n <- nrow(ObsData)
#Create data frames with A=1 and A=0
X1<-X0<-X
X1$A <-1
X0$A <-0
#Create new data by stacking
newdata <- rbind(X,X1,X0)
#Call superlearner
Qinit <- SuperLearner(Y=ObsData$Y, X=X, newX=newdata, SL.library=SL.library, family="binomial")
Qinit
Call:
SuperLearner(Y = ObsData$Y, X = X, newX = newdata, family = "binomial",
SL.library = SL.library)
Risk Coef
SL.glm_All 0.1766 0.6002
SL.step_All 0.1766 0.0000
SL.glm.interaction_All 0.1767 0.3998
#Predictions
#Pred prob of survival given A, W
QbarAW <- Qinit$SL.predict[1:n]
#Pred prob of surv for each subject given A=1 and w
Qbar1W <- Qinit$SL.predict[(n+1):(2*n)]
#Pred prob of surv for each subject given A=0 and w
Qbar0W <- Qinit$SL.predict[(2*n+1):(3*n)]
#Simple substitution estimator Psi(Q0)
PsiHat.SS <- mean(Qbar1W-Qbar0W);PsiHat.SS
[1] 0.199
#Step 2 g_0(A|W) with SuperLearner
w <- subset(ObsData, select=c(w1,w2,w3,w4))
gHatSL <- SuperLearner(Y=ObsData$A, X=w, SL.library=SL.library, family = binomial)
gHatSL
Call:
SuperLearner(Y = ObsData$A, X = w, family = binomial, SL.library = SL.library)
Risk Coef
SL.glm_All 0.2091 0.0000
SL.step_All 0.2091 0.3803
SL.glm.interaction_All 0.2090 0.6197
#Generate the pred prob of A=1 and, A=0 given covariates
gHat1W <- gHatSL$SL.predict
gHat0W <- 1-gHat1W
#Step 3: Clever covariate
HAW <- as.numeric(ObsData$A==1)/gHat1W - as.numeric(ObsData$A==0)/gHat0W;mean(HAW)
[1] 0.002954
H1W <- 1/gHat1W
H0W <- -1/gHat0W
#Step 4: Substitution estimation Q* of the ATE.
logitUpdate <- glm(ObsData$Y ~ -1 + offset(qlogis(QbarAW))+HAW, family='binomial')
eps <- logitUpdate$coef;eps
HAW
0.0004483
#Calculating the predicted values for each subject under each txt
QbarAW.star <- plogis(qlogis(QbarAW)+eps*HAW)
Qbar1W.star <- plogis(qlogis(Qbar1W)+eps*H1W)
Qbar0W.star <- plogis(qlogis(Qbar0W)+eps*H0W)
PsiHat.TMLE.SL <- mean(Qbar1W.star) - mean(Qbar0W.star)
cat("PsiHat.TMLE.SL:", PsiHat.TMLE.SL)
PsiHat.TMLE.SL: 0.1995
cat("\n PsiHat.TMLE.SL_bias:", abs(True_Psi-PsiHat.TMLE.SL))
PsiHat.TMLE.SL_bias: 0.001456
cat("\n Relative_PsiHat.TMLE.SL_bias:",abs(True_Psi-PsiHat.TMLE.SL)/True_Psi*100,"%")
Relative_PsiHat.TMLE.SL_bias: 0.7354 %
R-TMLE
library(tmle)
Welcome to the tmle package, version 1.2.0-2
Use tmleNews() to see details on changes and bug fixes
Attaching package: ‘tmle’
The following object is masked from ‘package:SuperLearner’:
SL.glm.interaction
w <- subset(ObsData, select=c(w1,w2,w3,w4))
tmle.fit <- tmle(Y=Y, A=A, W=w) # avoid naming the object "tmle", which shadows the function
cat("TMLER_Psi:", tmle.fit$estimates[[2]][[1]],";","95%CI(", tmle.fit$estimates[[2]][[3]],")")
TMLER_Psi: 0.1994 ; 95%CI( 0.1799 0.2189 )
cat("\n TMLE_bias:", abs(True_Psi-tmle.fit$estimates[[2]][[1]]))
TMLE_bias: 0.001407
cat("\n Relative_TMLE_bias:",abs(True_Psi-tmle.fit$estimates[[2]][[1]])/True_Psi*100,"%")
Relative_TMLE_bias: 0.7108 %
R-TMLE improving prediction
SL.TMLER.Psi <- tmle(Y=Y, A=A, W=w, family="binomial",
Q.SL.library = c("SL.glm", "SL.glm.interaction", "SL.gam", "SL.randomForest"),
g.SL.library = c("SL.glm", "SL.glm.interaction", "SL.gam", "SL.randomForest"))
cat("SL.TMLER.Psi:", SL.TMLER.Psi$estimates[[2]][[1]],";","95%CI(", SL.TMLER.Psi$estimates[[2]][[3]],")")
SL.TMLER.Psi: 0.1994 ; 95%CI( 0.1799 0.2188 )
cat("\n SL.TMLER.Psi_bias:", abs(True_Psi-SL.TMLER.Psi$estimates[[2]][[1]]))
SL.TMLER.Psi_bias: 0.001373
cat("\n Relative_SL.TMLER.Psi_bias:",abs(True_Psi-SL.TMLER.Psi$estimates[[2]][[1]])/True_Psi*100,"%")
Relative_SL.TMLER.Psi_bias: 0.6935 %
Thank you
Thank you for participating in this tutorial.
If you have updates or changes that you would like to make, please send me a pull request. Alternatively, if you have any questions, please e-mail me.
Miguel Angel Luque Fernandez
E-mail: miguel-angel.luque at lshtm.ac.uk
Twitter @WATZILEI
Session Info
devtools::session_info()
Session info --------------------------------------------------------------
setting value
version R version 3.3.0 (2016-05-03)
system x86_64, darwin13.4.0
ui RStudio (1.0.31)
language (EN)
collate en_US.UTF-8
tz Europe/London
date 2016-10-24
Packages ------------------------------------------------------------------
package * version date source
assertthat 0.1 2013-12-06 CRAN (R 3.3.0)
base64enc 0.1-3 2015-07-28 CRAN (R 3.3.0)
codetools 0.2-14 2015-07-15 CRAN (R 3.3.0)
devtools 1.12.0 2016-06-24 CRAN (R 3.3.0)
digest 0.6.10 2016-08-02 CRAN (R 3.3.0)
DT * 0.2 2016-08-09 CRAN (R 3.3.0)
evaluate 0.10 2016-10-11 CRAN (R 3.3.0)
foreach * 1.4.3 2015-10-13 CRAN (R 3.3.0)
formatR 1.4 2016-05-09 CRAN (R 3.3.0)
gam * 1.14 2016-09-10 CRAN (R 3.3.0)
htmltools 0.3.5 2016-03-21 CRAN (R 3.3.0)
htmlwidgets 0.7 2016-08-02 CRAN (R 3.3.0)
iterators 1.0.8 2015-10-13 CRAN (R 3.3.0)
jsonlite 1.1 2016-09-14 CRAN (R 3.3.0)
knitr 1.14 2016-08-13 CRAN (R 3.3.0)
magrittr 1.5 2014-11-22 CRAN (R 3.3.0)
memoise 1.0.0 2016-01-29 CRAN (R 3.3.0)
nnls * 1.4 2012-03-19 CRAN (R 3.3.0)
randomForest * 4.6-12 2015-10-07 CRAN (R 3.3.0)
Rcpp 0.12.7 2016-09-05 CRAN (R 3.3.0)
rmarkdown 1.1 2016-10-16 CRAN (R 3.3.0)
rsconnect 0.5 2016-10-17 CRAN (R 3.3.0)
stringi 1.1.2 2016-10-01 CRAN (R 3.3.0)
stringr 1.1.0 2016-08-19 CRAN (R 3.3.0)
SuperLearner * 2.0-19 2016-02-04 CRAN (R 3.3.0)
tibble 1.2 2016-08-26 CRAN (R 3.3.0)
tmle * 1.2.0-4 2014-03-09 CRAN (R 3.3.0)
withr 1.0.2 2016-06-20 CRAN (R 3.3.0)
yaml 2.1.13 2014-06-12 CRAN (R 3.3.0)
References
Bühlmann P, Drineas P, Laan M van der, Kane M. (2016). Handbook of big data. CRC Press.
Greenland S, Robins JM. (1986). Identifiability, exchangeability, and epidemiological confounding. International journal of epidemiology 15: 413–419.
Gruber S, Laan M van der. (2011). tmle: An R package for targeted maximum likelihood estimation. UC Berkeley Division of Biostatistics Working Paper Series.
Hampel FR. (1974). The influence curve and its role in robust estimation. Journal of the American Statistical Association 69: 383–393.
Laan M van der, Rose S. (2011). Targeted learning: Causal inference for observational and experimental data. Springer Series in Statistics.
Neugebauer R, Laan M van der. (2005). Why prefer double robust estimators in causal inference? Journal of Statistical Planning and Inference 129: 405–426.
Robins JM, Hernan MA, Brumback B. (2000). Marginal structural models and causal inference in epidemiology. Epidemiology 550–560.
Rubin DB. (2011). Causal inference using potential outcomes. Journal of the American Statistical Association.
#Introduction

During the last 30 years, the modern epidemiology has been able to identify some important drawbacks of the classic epidemiologic methods. Causal Inference [@robins2000] and the Neyma-Rubin Potential Outcomes framework [@rubin2011] have provided the theory and statistical methods needed to identify recurrent problems in observational epidemologic research, such as:

1. non collapsibility of the odds and hazard ratios,
2. impact of paradoxical effects due to conditioning on colliders,
3. left truncation,
4. prevalent cases,
5. selection bias related with the vague understanding of the effect of time on exposure and outcome and,
6. effect of time dependent confounding and mediators.
7. Etc.

To control for confounding, the classical epidemilogic methods require making the assumption that the effect measure is constant across levels of confounders included in the model.  

Alternatively, James Robins in 1986 demonstrated that using standardization, implemented through the use of the **G-formula**, allowed to obtain unconfounded marginal estimation of the causal average treatment effect (ATE) under causal nontestable assumptions [@robins1986]. The most commonly used estimator for a binary treatment effect is the risk difference or **ATE** = $\psi(P_{0})$.      

#The G-Formula

$$\psi(P_{0})\,=\,\sum_{w}\,\left[\sum_{y}\,P(Y=y\mid A=1,W=w)-\,\sum_{y}\,P(Y = y\mid A=0,W=w)\right]P(W=w)$$  

where,   

$$P(Y = y \mid A = a, W = w)\,=\,\frac{P(W = w, A = a, Y = y)}{\sum_{y}\,P(W = w, A = a, Y = y)}$$     
is the conditional probability distribution of Y = y, given A = a, W = w and, 

$$P(W = w)\,=\,\sum_{y,a}\,P(W = w, A = a, Y = y)$$ 

* Classical epidemilogic methods require making the assumption that the effect measure is constant across levels of confounders included in the model. However, **Standardization** allows us to obtain an unconfounded summary effect measure without requiring this assumption. The **G-formula** is a *generalization of standardization* [@robins1986].    

* The ATE can be estimated **non-parametrically** using the G-formula. However, the course of dimensionality in observational studies limits its estimation.   

* Hence, the estimation of the ATE using the G-formula relies mostly on **parametric modelling** assumptions and maximum likelihood estimation. The **correct model specification** in parametric modelling is crucial to obtain unbiased estimates of the true ATE [@rubin2011].  

However, Mark van der Laan and collaborators have developed a double-robust estimation procedure **to reduce bias against misspecification**. The targeted maximum likelihood estimation (TMLE) is a semiparametric, efficient substitution estimator [@van2011].    

**TMLE** allows for data-adaptive estimation while obtaining valid statistical inferencebased on the targeted minimum loss-based estimation and machine learning algorithms to minimize the risk of model misspecification [@van2011].      

1. **TMLE** is a general algorithm for the construction of double-robust, semiparametric, efficient substitution estimators. **TMLE** allows for data-adaptive estimation while obtaining valid statistical inference. 

2. **TMLE** implemtation uses the G-computation estimand (G-formula). Briefly, the **TMLE** algorithm uses information in the estimated exposure mechanism P(A|W) to update the initial estimator of the conditional expectaction of the outcome given the treatment and the set of covariates W, E$_{0}$(Y|A,W). 

3. The targeted estimates are then substituted into the parameter mapping $\Psi$. The updating step achieves a targeted bias reduction for the parameter of interest $\psi(P_{0})$ (the true target parameter) and serves to solve the efficient score equation. As a result, **TMLE** is a double robust estimator. 

4. **TMLE** it will be consistent for $\psi(P_{0})$ if either the conditional expectation E$_{0}$(Y|A,W) or the exposure mechanism P$_{0}$(A|W) are estimated consistently. When both functions are consistently estimated, the **TMLE** will be efficient in that it achieves the lowest asymptotic variance among a large class of estimators. These asymptotic properties typically translate into lower bias and variance in finite samples [@buh2016]. 

The general formula to estimate the ATE using the TMLE method:  

$$\psi TMLE,n = \Psi(Q_{n}^{*})= {\frac{1}{n}\sum_{i=1}^{n}\bar{Q}_{n}^{1}\left(1,\ W_{i}\right)-\bar{Q}_{n}^{1}\left(0,\ W_{i}\right)}.  (1)$$
The efficient influcence curve (IC) based on Hampel seminal paper [@hampel1974] is applied for statistical inference using TMLE:

$$IC_{n}(O_{i})\ \ =\ \left(\frac{I\left(A_{i}=1\right)}{g_n\left(1\left|W_{i}\right)\right)}\ -\ \frac{I\left(A_{i}=0\right)}{g_n\left(0\left|W_{i}\right)\right)}\ \right)\left[Y_{i}-\bar{Q}_{n}^{1}\left(A_{i},W_{i}\right)\right]+\bar{Q}_{n}^{1}\left(1,\ W_{i}\right)-\bar{Q}_{n}^{1}\left(0,\ W_{i}\right) - \psi TMLE,n. (2)$$
where the variance of the ATE:  

$$\sigma({\psi_{0}})=\sqrt{\frac{Var(IC_{n})}{n}}.  (3)$$

5. The procedure is available with standard software such as the **tmle** package in R  [@gruber2011]. 

6. The advantages of **TMLE** have been repeatedly demonstrated in both simulation studies and applied analyses [@van2011]. Evidence shows that **TMLE** provides the less unbiased ATE estimate compared with other double-robust estimators [@neu2005], [@van2011] such as the combination of regression adjustment with inverse probability of treatment weighting (IPTW-RA) and the augmented inverse probability of treatment weighting (AIPTW). The AIPTW estimation is a two step procedure with two equations (propensity score equation and mean outcome equation).  

7. To estimate the ATE using the AIPTW estimator one can set the estimation equation (EE) (4) equal to cero and use bootstrap to derive 95% confidence intervals (CI). However, solving the EE using the generalized method of moments (GMM), stacking both equations (propensity score and outcome), reduces the estimation and inference steps to only one. However, given that the propensity score in equation (4) can easily fall outside the range [0, 1] (if for some observations $g_{n}(1|W_{i})$ is close to 1 or 0) the AIPTW estimation can be unstable (near violation of the positivity assumption). This represents the price of not being a substitution estimator as **TMLE**.        

$$\psi_{0}^{AIPTW-ATE}\ \ =\ \frac{1}{n}\sum_{i=1}^{n}\left(\frac{I\left(A_{i}=1\right)}{g_n\left(1\left|W_{i}\right)\right)}\ -\ \frac{I\left(A_{i}=0\right)}{g_n\left(0\left|W_{i}\right)\right)}\ \right)\left[Y_{i}-\bar{Q}_{n}^{0}\left(A_{i},W_{i}\right)\right]+\frac{1}{n}\sum_{i=1}^{n}\bar{Q}_{n}^{0}\left(1,\ W_{i}\right)-\bar{Q}_{n}^{0}\left(0,\ W_{i}\right). (4)$$

#Structural causal framework

##Direct Acyclic Graph
![](Figures/DAG.png)
Figure 1. Direct Acyclic Graph

#Causal assumptions 
Under the counterfactual framework the following assumptions have to be considered to estimate the $\psi(P_{0})$ (ATE) with a mondel for $P_{0}$ augmented with additional nontestatble cuasal assumptions [@rubin2011], [@van2011]:     

##CMI or Randomization 
($Y_{0},Y_{1}\perp$A|W) of the binary treatment effect (A) on the outcome (Y) given the set of observed covariates (W), where W = (W1,  W2, W3, … , Wk). 

##Positivity 
a ϵ A: P(A=a | W) > 0  
P(A=1|W=w) > 0 and P(A=0| W = w) > 0 for each possible w.  

##Consistency or SUTVA: 
The observed outcome value, under the observed treatment, is equal to the counterfactual outcome corresponding to the observed treatment for identical independent distributed (i.i.d.) variables.    

#TMLE flow chart 
**Source**: Mark van der Laan and Sherri Rose. Targeted learning: causal inference for observational and experimental dataSpringer Series in Statistics, 2011.  
![](Figures/tmle.png)
Figure 2. TMLE flow chart (Road map)

#Data generation

##Simulation 

In R we create a function to generate the data. The function takes the number of draws as input and returns the generated observed data (ObsData), including the counterfactual outcomes (Y.1, Y.0).  

The simulated data replicate the DAG in Figure 1:  

1. Y: mortality binary indicator (1 death, 0 alive) 
2. A: binary treatment for emergency presentation at cancer diagnosis (1 EP, 0 NonEP)    
3. W1: Gender (1 male; 0 female)  
4. W2: Age at diagnosis (0 <65; 1 >=65)  
5. W3: Cancer TNM classification (scale from 1 to 4)  
6. W4: Comorbidities (scale from 1 to 5)  

```{r}
#install.packages("broom")
options(digits=4)
generateData <- function(n){
  w1 <- rbinom(n, size=1, prob=0.5)
  w2 <- rbinom(n, size=1, prob=0.65)
  w3 <- round(runif(n, min=0, max=4), digits=3)
  w4 <- round(runif(n, min=0, max=5), digits=3)
  A  <- rbinom(n, size=1, prob= plogis(-0.4 + 0.2*w2 + 0.15*w3 + 0.2*w4 + 0.15*w2*w4))
  Y  <- rbinom(n, size=1, prob= plogis(-1 + A -0.1*w1 + 0.3*w2 + 0.25*w3 + 0.2*w4 + 0.15*w2*w4))
  
  # counterfactual
  Y.1 <- rbinom(n, size=1, prob= plogis(-1 + 1 -0.1*w1 + 0.3*w2 + 0.25*w3 + 0.2*w4 + 0.15*w2*w4))
  Y.0 <- rbinom(n, size=1, prob= plogis(-1 + 0 -0.1*w1 + 0.3*w2 + 0.25*w3 + 0.2*w4 + 0.15*w2*w4))
  
  # return data.frame
  data.frame(w1, w2, w3, w4, A, Y, Y.1, Y.0)
}
set.seed(7777)
ObsData <- generateData(n=10000)
True_Psi <- mean(ObsData$Y.1-ObsData$Y.0);
cat(" True_Psi:", True_Psi)
Bias_Psi <- lm(data=ObsData, Y~ A)
cat("\n")
cat("\n Naive_Biased_Psi:", summary(Bias_Psi)$coef[2, 1])
Naive_Bias <- summary(Bias_Psi)$coef[2, 1] - True_Psi; cat("\n Naive bias:", Naive_Bias)
Naive_Relative_Bias <- Naive_Bias/True_Psi*100; cat("\n Relative naive bias:", Naive_Relative_Bias, "%")
```
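Because treatment assignment depends on w2-w4, the covariate distributions should differ between treatment arms; this imbalance is the source of the confounding seen in the naive estimate. A minimal, self-contained descriptive check (re-simulating data with the same mechanism purely for illustration):

```{r}
# Sketch: covariate imbalance across treatment arms, i.e. evidence of confounding
set.seed(7777)
n  <- 10000
w1 <- rbinom(n, size = 1, prob = 0.5)
w2 <- rbinom(n, size = 1, prob = 0.65)
w3 <- round(runif(n, min = 0, max = 4), digits = 3)
w4 <- round(runif(n, min = 0, max = 5), digits = 3)
A  <- rbinom(n, size = 1, prob = plogis(-0.4 + 0.2*w2 + 0.15*w3 + 0.2*w4 + 0.15*w2*w4))
dat <- data.frame(w1, w2, w3, w4, A)
# Covariate means by treatment arm: w2-w4 should be higher among the treated
aggregate(cbind(w1, w2, w3, w4) ~ A, data = dat, FUN = mean)
```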
##Data visualization

```{r}
# DT table = interactive
# install.packages("DT") # install DT first
library(DT)
datatable(ObsData, options = list(pageLength = 5))
```

#TMLE simple implementation

##Step 1: $Q_{0}$(A,**W**) 
Estimation of the initial probability of the outcome (Y) given the treatment (A) and the set of covariates (W), denoted $\bar{Q}^{0}$(A,**W**). To estimate it we can use a standard logistic regression model: 

$$\text{logit}[P(Y=1|A,W)]\,=\,\beta_{0}\,+\,\beta_{1}A\,+\,\beta_{2}^{T}W.$$ 

Therefore, we can estimate the initial probability as follows: 

$$\bar{Q}^{0}(A,W)\,=\,\text{expit}(\hat{\beta_{0}}\,+\,\hat{\beta_{1}}A\,+\,\hat{\beta_{2}^{T}}W).$$ 

The predicted probabilities can also be estimated with the Super Learner algorithm, implemented in the R package "SuperLearner" (Ref), allowing flexible terms that are functions of A or W (e.g., polynomial terms of A and W, as well as interaction terms between A and W). Then, for each subject, the predicted probabilities of both potential outcomes, $\bar{Q}^{0}(0,W)$ and $\bar{Q}^{0}(1,W)$, are obtained by setting A = 0 and A = 1 for everyone, respectively:
$$\bar{Q}^{0}(0,W)\,=\,\text{expit}(\hat{\beta_{0}}\,+\,\hat{\beta_{2}^{T}}W),$$
and,  
$$\bar{Q}^{0}(1,W)\,=\,\text{expit}(\hat{\beta_{0}}\,+\,\hat{\beta_{1}}\,+\,\hat{\beta_{2}^{T}}W).$$

```{r}
ObsData <- subset(ObsData, select = c(w1, w2, w3, w4, A, Y))
Y  <- ObsData$Y
A  <- ObsData$A
w1 <- ObsData$w1
w2 <- ObsData$w2
w3 <- ObsData$w3
w4 <- ObsData$w4
m  <- glm(Y ~ A + w1 + w2 + w3 + w4, family = binomial, data = ObsData)
# Predictions on the logit scale (used as the offset in the TMLE updating step)
Q  <- cbind(QAW = predict(m),
            Q1W = predict(m, newdata = data.frame(A = 1, w1, w2, w3, w4)),
            Q0W = predict(m, newdata = data.frame(A = 0, w1, w2, w3, w4)))
Q0 <- as.data.frame(Q)
# Simple substitution (G-computation) estimate, back-transformed to probabilities
mean(plogis(Q0$Q1W) - plogis(Q0$Q0W))
```

##Step 2: $g_{0}(A,W)$
Estimation of the probability of the treatment (A) given the set of covariates (W), denoted $g_{0}(A,W)$. We can again use a logistic regression model and, to improve prediction, the Super Learner library or any other machine learning strategy:  

$$\text{logit}[P(A=1|W)]\,=\,\beta_{0}\,+\,\beta_{1}^{T}W.$$ 
Then, we estimate the predicted probability $\hat{g}(1\mid W)\,=\,\hat{P}(A=1\mid W)$ using:  

$$\hat{g}(1\mid W)\,=\,\text{expit}(\hat{\beta_{0}}\,+\,\hat{\beta_{1}^{T}}W).$$ 

```{r}
g <- glm(A ~ w2 + w3 + w4, family = binomial, data = ObsData)
g1w <- predict(g, type = "response"); summary(g1w)
```
##Step 3: HAW and $\epsilon$
This step finds a better prediction model, targeted at minimising the mean squared error (MSE) for the parameter of interest, by using the efficient influence curve (IC) estimating equation. For the ATE, convergence is achieved in one step: given $\bar{Q}^{0}$ and $\hat{g}(1\mid W)$, the fluctuation parameters $\epsilon_{0}$ and $\epsilon_{1}$ are estimated with a parametric working model, as follows:

$$\bar{Q}^{1}(A,W)\,=\,\text{expit}\left[\text{logit}\left(\bar{Q}^{0}(A,W)\right)\,+\,\hat{\epsilon}_{0}H_{0}(A,W)\,+\,\hat{\epsilon}_{1}H_{1}(A,W)\right]  (5)$$
$$\bar{Q}^{1}(0,W)\,=\,\text{expit}\left[\text{logit}\left(\bar{Q}^{0}(0,W)\right)\,+\,\hat{\epsilon}_{0}H_{0}(0,W)\right]$$


$$\bar{Q}^{1}(1,W)\,=\,\text{expit}\left[\text{logit}\left(\bar{Q}^{0}(1,W)\right)\,+\,\hat{\epsilon}_{1}H_{1}(1,W)\right]$$
Where,
$$H_{0}(A,W)\,=\,\frac{I(A=0)}{\hat{g}(0\mid W)}\;\text{and}\;H_{1}(A,W)\,=\,\frac{I(A=1)}{\hat{g}(1\mid W)}$$ are referred to as the clever covariates (note that $\hat{g}(\cdot\mid W)$ is estimated in step 2).

The fluctuation parameters $(\hat{\epsilon}_{0},\hat{\epsilon}_{1})$ are estimated by maximum likelihood, setting $\text{logit}(\bar{Q}^{0}(A,W))$ as an offset in an intercept-free logistic regression with $H_{0}$ and $H_{1}$ as covariates. The estimated probabilities of the potential outcomes are then updated using $(\hat{\epsilon}_{0},\hat{\epsilon}_{1})$. The update is performed by setting A = 0 and A = 1 for each subject both in the initial estimates of the potential outcomes, giving $\bar{Q}^{1}(0,W)$ and $\bar{Q}^{1}(1,W)$, and in the clever covariates, $H_{0}(0,W)$ and $H_{1}(1,W)$. 

For the ATE, the update of the potential-outcome estimates requires only one iteration, from $\bar{Q}^{0}(A,W)$ to $\bar{Q}^{1}(A,W)$, to obtain $\Psi(\bar{Q}_{n}^{*})$. Model (5) targets $E[\hat{Y}_{A=0}]$ and $E[\hat{Y}_{A=1}]$ simultaneously by including both $H_{0}(A,W)$ and $H_{1}(A,W)$.   

```{r}
#Model (5): clever covariates and fluctuation parameter
#A single clever covariate H(A,W) = I(A=1)/g(1|W) - I(A=0)/g(0|W) is used here:
#a one-parameter fluctuation that targets the ATE directly
h <- cbind(HAW = A/g1w - (1 - A)/(1 - g1w), H1W = 1/g1w, H0W = -1/(1 - g1w))
epsilon <- coef(glm(Y ~ -1 + h[, "HAW"] + offset(Q[, "QAW"]), family = binomial)); epsilon
```

##Step 4: $\bar{Q_{n}}^{*}$
$$\hat{\psi}_{TMLE,n}\,=\,\Psi(\bar{Q}_{n}^{*})\,=\,\frac{1}{n}\sum_{i=1}^{n}\left[\bar{Q}_{n}^{1}\left(1,W_{i}\right)-\bar{Q}_{n}^{1}\left(0,W_{i}\right)\right].  (1)$$
```{r}
Qstar <- plogis(Q + epsilon*h)
Psi <- mean(Qstar[,"Q1W"] - Qstar[,"Q0W"]);cat("TMLE_Psi:", Psi)
cat("\n TMLE.SI_bias:", abs(True_Psi-Psi))
cat("\n Relative_TMLE.SI_bias:",abs(True_Psi-Psi)/True_Psi*100,"%")
```
##Step 5: Inference
$$IC_{n}(O_{i})\,=\,\left(\frac{I\left(A_{i}=1\right)}{g_{n}\left(1\mid W_{i}\right)}\,-\,\frac{I\left(A_{i}=0\right)}{g_{n}\left(0\mid W_{i}\right)}\right)\left[Y_{i}-\bar{Q}_{n}^{1}\left(A_{i},W_{i}\right)\right]+\bar{Q}_{n}^{1}\left(1,W_{i}\right)-\bar{Q}_{n}^{1}\left(0,W_{i}\right)-\hat{\psi}_{TMLE,n}.  (2)$$

where the standard error of the ATE is estimated as:  

$$\hat{\sigma}(\hat{\psi}_{0})\,=\,\sqrt{\frac{\widehat{Var}(IC_{n})}{n}}.  (3)$$

```{r}
Q <- as.data.frame(Q)        #initial (untargeted) predictions, logit scale
Qstar <- as.data.frame(Qstar)
#The efficient IC is evaluated at the updated (targeted) estimates, eq. (2)
IC <- h[,1]*(Y - Qstar$QAW) + Qstar$Q1W - Qstar$Q0W - Psi; summary(IC)
n <- nrow(ObsData)
varHat.IC <- var(IC)/n; varHat.IC
#Psi and 95%CI for Psi
cat("\n TMLE.SI_Psi:", Psi)
cat("\n 95%CI:", c(Psi - 1.96*sqrt(varHat.IC), Psi + 1.96*sqrt(varHat.IC)))
cat("\n Relative_TMLE.SI_bias:", abs(True_Psi - Psi)/True_Psi*100, "%")
```

#Comparison with AIPTW

$$\hat{\psi}_{n}^{AIPTW-ATE}\,=\,\frac{1}{n}\sum_{i=1}^{n}\left(\frac{I\left(A_{i}=1\right)}{g_{n}\left(1\mid W_{i}\right)}\,-\,\frac{I\left(A_{i}=0\right)}{g_{n}\left(0\mid W_{i}\right)}\right)\left[Y_{i}-\bar{Q}_{n}^{0}\left(A_{i},W_{i}\right)\right]+\frac{1}{n}\sum_{i=1}^{n}\left[\bar{Q}_{n}^{0}\left(1,W_{i}\right)-\bar{Q}_{n}^{0}\left(0,W_{i}\right)\right].  (4)$$

```{r}
#The initial predictions are on the logit scale, so back-transform with plogis()
AIPTW <- mean(h[,1]*(Y - plogis(Q$QAW)) + (plogis(Q$Q1W) - plogis(Q$Q0W))); AIPTW
cat("\n AIPTW_bias:", abs(True_Psi-AIPTW))
cat("\n Relative_AIPTW_bias:",abs(True_Psi-AIPTW)/True_Psi*100,"%")
```
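As noted earlier, the bootstrap can be used to derive 95% CIs for the AIPTW estimator. A minimal, self-contained sketch (the data-generating mechanism mirrors the tutorial's simulation; the function names genData and aiptw, and B = 200 resamples, are illustrative choices):

```{r}
# Sketch: nonparametric bootstrap 95% CI for the AIPTW estimator
set.seed(7777)
genData <- function(n){
  w1 <- rbinom(n, 1, 0.5); w2 <- rbinom(n, 1, 0.65)
  w3 <- runif(n, 0, 4);    w4 <- runif(n, 0, 5)
  A  <- rbinom(n, 1, plogis(-0.4 + 0.2*w2 + 0.15*w3 + 0.2*w4 + 0.15*w2*w4))
  Y  <- rbinom(n, 1, plogis(-1 + A - 0.1*w1 + 0.3*w2 + 0.25*w3 + 0.2*w4 + 0.15*w2*w4))
  data.frame(w1, w2, w3, w4, A, Y)
}
aiptw <- function(d){
  # Refit both nuisance models on the (resampled) data
  m   <- glm(Y ~ A + w1 + w2 + w3 + w4, family = binomial, data = d)
  g   <- glm(A ~ w2 + w3 + w4, family = binomial, data = d)
  gw  <- predict(g, type = "response")
  QAW <- predict(m, type = "response")
  Q1W <- predict(m, newdata = transform(d, A = 1), type = "response")
  Q0W <- predict(m, newdata = transform(d, A = 0), type = "response")
  mean((d$A/gw - (1 - d$A)/(1 - gw)) * (d$Y - QAW) + Q1W - Q0W)
}
d   <- genData(1000)
est <- aiptw(d)
B   <- 200  # kept small for speed
boot <- replicate(B, aiptw(d[sample(nrow(d), replace = TRUE), ]))
c(est = est, quantile(boot, c(0.025, 0.975)))
```

Refitting the outcome and treatment models within each resample is what makes the bootstrap account for the estimation of the nuisance parameters.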

#TMLE using the Super-Learner  

```{r}
#Q0
library(SuperLearner)
#Specify SuperLearner libraries
SL.library <- c("SL.glm","SL.step","SL.glm.interaction")
#Data frame with X with baseline covariates and exposure
X <- subset(ObsData, select=c(A, w1, w2, w3, w4))
n <- nrow(ObsData)
#Create data frames with A=1 and A=0
X1<-X0<-X
X1$A <-1
X0$A <-0
#Create new data by stacking
newdata <- rbind(X,X1,X0)
#Call superlearner
Qinit <- SuperLearner(Y=ObsData$Y, X=X, newX=newdata, SL.library=SL.library, family="binomial")
Qinit
#Predictions
#Predicted probability of death given the observed A and W
QbarAW <- Qinit$SL.predict[1:n]
#Predicted probability of death for each subject setting A=1
Qbar1W <- Qinit$SL.predict[(n+1):(2*n)]
#Predicted probability of death for each subject setting A=0
Qbar0W <- Qinit$SL.predict[(2*n+1):(3*n)]
#Simple substitution estimator Psi(Q0)
PsiHat.SS <- mean(Qbar1W-Qbar0W);PsiHat.SS
```

```{r}
#Step 2: g_0(A|W) with SuperLearner
w <- subset(ObsData, select = c(w1, w2, w3, w4))
gHatSL <- SuperLearner(Y = ObsData$A, X = w, SL.library = SL.library, family = binomial)
gHatSL
#Predicted probabilities of A=1 and A=0 given the covariates
gHat1W <- gHatSL$SL.predict
gHat0W <- 1 - gHat1W
summary(gHat1W)
#Step 3: Clever covariate
HAW <- as.numeric(ObsData$A==1)/gHat1W - as.numeric(ObsData$A==0)/gHat0W;mean(HAW)
H1W <-  1/gHat1W
H0W <- -1/gHat0W
```

```{r}
#Step 4: Substitution estimation Q* of the ATE.
logitUpdate <- glm(ObsData$Y ~ -1 + offset(qlogis(QbarAW))+HAW, family='binomial')
eps <- logitUpdate$coef;eps
#Calculating the predicted values for each subject under each txt
QbarAW.star <- plogis(qlogis(QbarAW)+eps*HAW)
Qbar1W.star <- plogis(qlogis(Qbar1W)+eps*H1W)
Qbar0W.star <- plogis(qlogis(Qbar0W)+eps*H0W)
PsiHat.TMLE.SL <- mean(Qbar1W.star) - mean(Qbar0W.star)
cat("PsiHat.TMLE.SL:", PsiHat.TMLE.SL)
cat("\n PsiHat.TMLE.SL_bias:", abs(True_Psi-PsiHat.TMLE.SL))
cat("\n Relative_PsiHat.TMLE.SL_bias:",abs(True_Psi-PsiHat.TMLE.SL)/True_Psi*100,"%")
```

#R-TMLE

```{r}
library(tmle)
w <- subset(ObsData, select = c(w1, w2, w3, w4))
tmleFit <- tmle(Y = Y, A = A, W = w)
cat("TMLER_Psi:", tmleFit$estimates$ATE$psi, ";", "95%CI(", tmleFit$estimates$ATE$CI, ")")
cat("\n TMLE_bias:", abs(True_Psi - tmleFit$estimates$ATE$psi))
cat("\n Relative_TMLE_bias:", abs(True_Psi - tmleFit$estimates$ATE$psi)/True_Psi*100, "%")
```

#R-TMLE improving prediction

```{r}
SL.TMLER.Psi <- tmle(Y = Y, A = A, W = w, family = "binomial", 
    Q.SL.library = c("SL.glm", "SL.glm.interaction", "SL.gam", "SL.randomForest"),
    g.SL.library = c("SL.glm", "SL.glm.interaction", "SL.gam", "SL.randomForest"))

cat("SL.TMLER.Psi:", SL.TMLER.Psi$estimates$ATE$psi, ";", "95%CI(", SL.TMLER.Psi$estimates$ATE$CI, ")")
cat("\n SL.TMLER.Psi_bias:", abs(True_Psi - SL.TMLER.Psi$estimates$ATE$psi))
cat("\n Relative_SL.TMLER.Psi_bias:", abs(True_Psi - SL.TMLER.Psi$estimates$ATE$psi)/True_Psi*100, "%")
```

#Thank you  
Thank you for participating in this tutorial.  
If you have updates or changes that you would like to make, please send <a href="https://github.com/migariane/MALF" target="_blank">me</a> a pull request.
Alternatively, if you have any questions, please e-mail me.  
**Miguel Angel Luque Fernandez**  
**E-mail:** *miguel-angel.luque at lshtm.ac.uk*  
**Twitter** `@WATZILEI`  

# Session Info 
```{r session-info, results ='markup'}
devtools::session_info()
```
# References 
